Abstract

Planned experiments are the gold standard in reliably comparing the causal effect 
of switching from a baseline policy to a new policy. One critical shortcoming of classical
experimental methods, however, is that they typically do not take into account the
dynamic nature of response to policy changes. For instance, in an experiment where 
we seek to understand the effects of a new ad pricing policy on auction revenue, agents 
may adapt their bidding in response to the experimental pricing changes. Thus, the causal
effects of the new pricing policy after such an adaptation period, the long-term causal
effects, are not captured by the classical methodology even though they clearly are
more indicative of the value of the new policy. Here, we formalize a framework to 
define and estimate long-term causal effects of policy changes in multiagent economies. 
Central to our approach is behavioral game theory, which we leverage to formulate the 
ignorability assumptions that are necessary for causal inference. Under such assumptions
we estimate long-term causal effects through a latent space approach, where a
behavioral model of how agents act conditional on their latent behaviors is combined 
with a temporal model of how behaviors evolve over time. 
1 Introduction


A multiagent economy is comprised of agents interacting under specific economic rules. A 
common problem of interest is to experimentally evaluate changes to such rules, also known 
as treatments, on an objective of interest. For example, an online ad auction platform is 
a multiagent economy, where one problem is to estimate the effect of raising the reserve 
price on the platform’s revenue. Assessing causality of such effects is a challenging problem 
because there is a conceptual discrepancy between what needs to be estimated and what is 
available in the data, as illustrated in Figure 1.

What needs to be estimated is the causal effect of a policy change, which is defined as 
the difference between the objective value when the economy is treated, i.e., when all agents 
interact under the new rules, relative to when the same economy is in control, i.e., when all 
agents interact under the baseline rules. Such a definition of causal effects is logically
necessitated by the designer’s task, which is to select either the treatment or the control policy
based on their estimated revenues, and then apply such policy to all agents in the economy.
The long-term causal effect is the causal effect defined after the system has stabilized, and is
more representative of the value of policy changes in dynamical systems. Thus, in Figure 1
the long-term causal effect is the difference between the objective values at the top and 
bottom endpoints, marked as the “targets of inference”. 

What is available in the experimental data, however, typically comes from designs such 
as the so-called A/B test, where we randomly assign some agents to the treated economy 
(new rules B) and the others to the control economy (baseline rules A), and then compare 
the outcomes. In Figure 1 the experimental data are depicted as the solid time-series in the
middle of the plot, marked as the “observed data”. 

Therefore the challenge in estimating long-term causal effects is that we generally need 
to perform two inferential tasks simultaneously, namely, 


(i) infer outcomes across possible experimental assignments (y-axis in Figure 1), and

(ii) infer long-term outcomes from short-term experimental data (x-axis in Figure 1).


The first task is commonly known as the “fundamental problem of causal inference” (Holland,
1986; Rubin, 2011) because it underscores the impossibility of observing in the same
experiment the outcomes for both policy assignments that define the causal effect; i.e., that
we cannot observe in the same experiment both the outcomes when all agents are treated
and the outcomes when all agents are in control, the assignments of which are denoted by
\(Z = \mathbf{1}\) and \(Z = \mathbf{0}\), respectively, in Figure 1. In fact the role of experimental design, as
conceived by Fisher (1935), is exactly to quantify the uncertainty about such causal effects
that cannot be observed due to the aforementioned fundamental problem, by using standard
errors that can be observed in a carefully designed experiment.

Figure 1: The two inferential tasks for causal inference in multiagent economies. First, infer agent
actions across treatment assignments (y-axis), particularly, the assignment where all agents are
in the treated economy (top assignment, \(Z = \mathbf{1}\)), and the assignment where all agents are in the
control economy (bottom assignment, \(Z = \mathbf{0}\)). Second, infer across time (x-axis), from \(t_0\) (last
observation time) to long-term \(T\). What we seek in order to evaluate the causal effect of the new
treatment is the difference between the objectives (e.g., revenue) at the two inferential target endpoints.

The second task, however, is unique to causal inference in dynamical systems, such as 
the multiagent economies that we study in this paper, and has received limited attention so 
far. Here, we argue that it is crucial to study long-term causal effects, i.e., effects measured 
after the system has stabilized, because such effects are more representative of the value of 
policy changes. If our analysis focused only on the observed data part depicted in Figure 1,
then policy evaluation would reflect transient effects that might differ substantially from the 
long-term effects. For instance, raising the reserve price in an auction might increase revenue
in the short term but, as agents adapt their bids or switch to another platform altogether,
the long-term effect could be a net decrease in revenue (Holland and Miller, 1991).
1.1 Related work and our contributions


There have been several important projects related to causal inference in multiagent economies.
For instance, Ostrovsky and Schwarz (2011) evaluated the effects of an increase in the reserve
price of Yahoo! ad auctions on revenue. Auctions were randomly assigned to an increased
reserve price treatment, and the effect was estimated using difference-in-differences (DID),
which is a popular econometric method (Card and Krueger, 1994; Donald and Lang, 2007;
Ostrovsky and Schwarz, 2011). The DID method compares the difference in outcomes before
and after the intervention for both the treated and control units (the ad auctions in this
experiment) and then compares the two differences. In relation to Figure 1, DID extrapolates
across assignments (y-axis) and across time (x-axis) by making a strong additivity
assumption (Abadie, 2005; Angrist and Pischke, 2008, Section 5.2), specifically, by assuming
that the dependence of revenue on reserve price and time is additive.


In a structural approach, Athey et al. (2011) studied the effects of auction format (ascending
versus sealed bid) on competition for timber tracts. Their approach was to estimate
agent valuations from observed data (agent bids) in one auction format and then impute
counterfactual bid distributions in the other auction format, under the assumption of
equilibrium play in the observed data. In relation to Figure 1, their approach extrapolates across
assignments by assuming that agent individual valuations for tracts are independent of the
treatment assignment, and extrapolates across time by assuming that the observed agent bids
are already in equilibrium. Similar approaches are followed in econometrics for estimation
of general equilibrium effects (Heckman et al., 1998; Heckman and Vytlacil, 2005).


In a causal graph approach (Pearl, 2000), Bottou et al. (2013) studied effects of changes in
the algorithm that scores Bing ads on the ad platform’s revenue. Their approach was to create
a directed acyclic graph (DAG) among related variables, such as queries, bids, and prices.
Through a “Causal Markov” assumption they could predict counterfactuals for revenue,
using only data from the control economy (observational study). In relation to Figure 1,
their approach is non-experimental and extrapolates across assignments and across time by
assuming a DAG as the correct data model, which is also assumed to
be stable with respect to treatment assignment, and by estimating counterfactuals through
the fitted model.

Our work is different from prior work because it takes into account the short-term aspect 
of experimental data to evaluate long-term causal effects, which is the key conceptual and 
practical challenge that arises in empirical applications. In contrast, classical econometric
methods, such as DID, assume strong linear trends from short-term to long-term, whereas 
structural approaches typically assume that the experimental data are already long-term as 
they are observed in equilibrium. We refer the reader to Sections 2 and 3 of the supplement 
for more detailed comparisons. 

In summary, our key contribution is that we develop a formal framework that (i) articulates
the distinction between short-term and long-term causal effects, (ii) leverages behavioral
game-theoretic models for causal analysis of multiagent economies, and (iii) explicates
theory that enables valid inference of long-term causal effects.

2 Definitions 

Consider a set of agents \(\mathcal{I}\) and a set of actions \(\mathcal{A}\), indexed by \(i\) and \(a\), respectively. The
experiment designer wants to run an experiment to evaluate a new policy against the baseline
policy relative to an objective. In the experiment each agent is assigned to one policy, and
the experimenter observes how agents act over time. Formally, let \(Z = (Z_i)\) be the \(|\mathcal{I}| \times 1\)
assignment vector where \(Z_i = 1\) denotes that agent \(i\) is assigned to the new policy, and
\(Z_i = 0\) denotes that \(i\) is assigned to the baseline policy; as a shorthand, \(Z = \mathbf{1}\) denotes that
all agents are assigned to the new policy, and \(Z = \mathbf{0}\) denotes that all agents are assigned
to the baseline policy, where \(\mathbf{1}, \mathbf{0}\) generally denote an appropriately-sized vector of ones
and zeroes, respectively. In the simplest case, the experiment is an A/B test, where \(Z\) is
uniformly random on \(\{0, 1\}^{|\mathcal{I}|}\) subject to \(\sum_i Z_i = |\mathcal{I}|/2\).
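
As a minimal sketch of the A/B design just described, the snippet below draws an assignment vector uniformly among all vectors with exactly half of the agents treated. The helper name `sample_ab_assignment` and its `seed` argument are illustrative assumptions, not part of the paper.

```python
import random

def sample_ab_assignment(num_agents, seed=None):
    """Draw Z uniformly from {0,1}^n subject to sum(Z) = n/2 (hypothetical helper)."""
    if num_agents % 2 != 0:
        raise ValueError("an even number of agents is needed for an equal split")
    rng = random.Random(seed)
    # Choosing the treated index set uniformly is equivalent to a uniform draw of Z.
    treated = set(rng.sample(range(num_agents), num_agents // 2))
    return [1 if i in treated else 0 for i in range(num_agents)]

Z = sample_ab_assignment(40, seed=0)
rho_Z = sum(Z) / len(Z)  # proportion of agents assigned to the new policy
```

Sampling the treated index set, rather than flipping independent coins, is what enforces the equal-split constraint exactly.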

After the initial assignment \(Z\), agents play actions at discrete time points from \(t = 0\)
to \(t = t_0\). Let \(A_i(t; Z) \in \mathcal{A}\) be the random variable that denotes the action of agent \(i\)
at time \(t\) under assignment \(Z\). The population action \(\alpha_j(t; Z) \in \Delta^{|\mathcal{A}|}\), where \(\Delta^p\) denotes
the \(p\)-dimensional simplex, is the frequency of actions at time \(t\) under assignment \(Z\) of
agents that were assigned to policy \(j\); for example, assuming two actions \(\mathcal{A} = \{a_1, a_2\}\), then
\(\alpha_1(0; Z) = [0.2, 0.8]\) denotes that, under assignment \(Z\), 20% of agents assigned to the new
policy play action \(a_1\) at \(t = 0\), while the rest play \(a_2\). We assume that the objective value
for the experimenter depends on the population action, in a similar way that, say, auction
revenue depends on agents’ aggregate bidding. The objective value in policy \(j\) at time \(t\)
under assignment \(Z\) is denoted by \(R(\alpha_j(t; Z))\), where \(R : \Delta^{|\mathcal{A}|} \to \mathbb{R}\). For instance, suppose
in the previous example that \(a_1\) and \(a_2\) produce revenue $10 and -$2, respectively, each time
they are played; then \(R\) is linear and \(R([0.2, 0.8]) = 0.2 \cdot \$10 - 0.8 \cdot \$2 = \$0.4\).
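
The two-action example above can be reproduced with a short sketch. The function names `population_action` and `linear_objective`, and the ten made-up agents, are assumptions introduced only for illustration.

```python
from collections import Counter

def population_action(actions_played, action_set):
    """Frequency of each action among the agents of one policy at one time point."""
    counts = Counter(actions_played)
    n = len(actions_played)
    return [counts[a] / n for a in action_set]

def linear_objective(alpha, per_action_revenue):
    """R(alpha) for an objective that is linear in the population action."""
    return sum(p * r for p, r in zip(alpha, per_action_revenue))

# Ten hypothetical agents: two play a1 and eight play a2.
played = ["a1"] * 2 + ["a2"] * 8
alpha = population_action(played, ["a1", "a2"])    # frequencies of a1 and a2
revenue = linear_objective(alpha, [10.0, -2.0])    # 0.2 * 10 - 0.8 * 2
```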

Definition 1. The average causal effect on objective \(R\) at time \(t\) of the new policy relative
to the baseline is denoted by \(CE(t)\) and is defined as

\[ CE(t) = E\big(R(\alpha_1(t; \mathbf{1})) - R(\alpha_0(t; \mathbf{0}))\big). \tag{1} \]


Suppose that \((t_0, T]\) is the time interval required for the economy to adapt to the
experimental conditions. The exact definition of \(T\) is important but we defer this discussion
to Section 3.1. The designer concludes that the new policy is better than the baseline if
\(CE(T) > 0\). Thus, \(CE(T)\) is the long-term average causal effect and is a function of two
objective values, \(R(\alpha_1(T; \mathbf{1}))\) and \(R(\alpha_0(T; \mathbf{0}))\), which correspond to the two inferential target
endpoints in Figure 1. Neither value is observed in the experiment because agents are
randomly split between policies, and their actions are observed only for the short-term period
\([0, t_0]\). Thus we need to (i) extrapolate across assignments by pivoting from the observed
assignment to the counterfactuals \(Z = \mathbf{1}\) and \(Z = \mathbf{0}\); (ii) extrapolate across time from the
short-term data \([0, t_0]\) to the long-term \(t = T\). We perform these two extrapolations based
on a latent space approach, which is described next.


2.1 Behavioral and temporal models 


We assume a latent behavioral model of how agents select actions, inspired by models from 
behavioral game theory. The behavioral model is used to predict agent actions conditional 
on agent behaviors, and is combined with a temporal model to predict behaviors in the 
long-term. The two models are ultimately used to estimate agent actions in the long-term, 
and thus estimate long-term causal effects. As the choice of the latent space is not unique, 
in Section 3.1 we discuss why we chose to use behavioral models from game theory.

Let \(B_i(t; Z)\) denote the behavior that agent \(i\) adopts at time \(t\) under experimental
assignment \(Z\). The following assumption puts a constraint on the space of possible behaviors
that agents can adopt, which will simplify the subsequent analysis.


Assumption 1 (Finite set of possible behaviors). There is a fixed and finite set of behaviors
\(\mathcal{B}\) such that for every time \(t\), assignment \(Z\) and agent \(i\), it holds that \(B_i(t; Z) \in \mathcal{B}\); i.e., every
agent can only adopt a behavior from \(\mathcal{B}\).
The set of possible behaviors \(\mathcal{B}\) essentially defines a \(|\mathcal{B}| \times |\mathcal{A}|\) collection of probabilities that
is sufficient to compute the likelihood of actions played conditional on adopted behavior; we
refer to such collection as the behavioral model.

Definition 2 (Behavioral model). The behavioral model for policy \(j\) defined by set \(\mathcal{B}\) of
behaviors is the collection of probabilities

\[ P(A_i(t; Z) = a \mid B_i(t; Z) = b, G_j), \]

for every action \(a \in \mathcal{A}\) and every behavior \(b \in \mathcal{B}\), where \(G_j\) denotes the characteristics of
policy \(j\).

As an example, a non-sophisticated behavior \(b_0\) could imply that
\(P(A_i(t; Z) = a \mid b_0, G_j) = 1/|\mathcal{A}|\), i.e., that the agent adopting \(b_0\) simply plays actions at random.
Conditioning on policy \(j\) in Definition 2 allows an agent to choose its actions based on expected payoffs,
which depend on the policy characteristics. For instance, in the application of Section 4 we
consider a behavioral model where an agent picks actions in a two-person game according to
expected payoffs calculated from the game-specific payoff matrix; in that case \(G_j\) is simply
the payoff matrix of game \(j\).

The population behavior \(\beta_j(t; Z) \in \Delta^{|\mathcal{B}|}\) denotes the frequency at time \(t\) under assignment
\(Z\) of the adopted behaviors of agents assigned to policy \(j\). Let \(\mathcal{F}_t\) denote the entire history
of population behaviors in the experiment up to time \(t\). A temporal model of behaviors is
defined as follows.

Definition 3 (Temporal model). For an experimental assignment \(Z\) a temporal model for
policy \(j\) is a collection of parameters \(\phi_j(Z)\), \(\psi_j(Z)\), and densities \((\pi, f)\), such that for all \(t\),

\[ \beta_j(0; Z) \sim \pi(\cdot\,; \phi_j(Z)), \]

\[ \beta_j(t; Z) \mid \mathcal{F}_{t-1}, G_j \sim f(\cdot \mid \psi_j(Z), \mathcal{F}_{t-1}). \tag{2} \]

A temporal model defines the distribution of population behavior as a time-series with
a Markovian structure subject to \(\pi\) and \(f\) being stable with respect to \(Z\). In other words,
regardless of how agents are assigned to games, the population behavior in the game will
evolve according to a fixed model described by \(f\) and \(\pi\). The model parameters \(\phi, \psi\) may
still depend on the treatment assignment \(Z\).
3 Estimation of long-term causal effects

Here we develop the assumptions that are necessary for inference of long-term causal effects. 

Assumption 2 (Stability of initial behaviors). Let \(\rho_Z = \sum_i Z_i / |\mathcal{I}|\) be the proportion of
agents assigned to the new policy under assignment \(Z\). Then, for every possible \(Z\),

\[ \rho_Z \beta_1(0; Z) + (1 - \rho_Z) \beta_0(0; Z) = \beta^{(0)}, \tag{3} \]

where \(\beta^{(0)}\) is a fixed population behavior invariant to \(Z\).

Assumption 3 (Behavioral ignorability). The assignment is independent of population
behavior at time \(t\), conditional on policy and behavioral history up to \(t\); i.e., for every \(t > 0\)
and policy \(j\),

\[ Z \perp\!\!\!\perp \beta_j(t; Z) \mid \mathcal{F}_{t-1}, G_j. \]

Remarks. Assumption 2 implies that the agents do not anticipate the assignment \(Z\) as
they “have made up their minds” to adopt a population behavior \(\beta^{(0)}\) before the experiment.
It follows that the population behavior \(\beta_1(0; Z)\) marginally corresponds to \(\rho_Z |\mathcal{I}|\) draws from
\(|\mathcal{B}|\) bins of total size \(|\mathcal{I}| \beta^{(0)}\). The bin selection probabilities at every draw depend on the
experimental design; for instance, in an A/B experiment where \(\rho_Z = 0.5\) the population
behavior at \(t = 0\) can be sampled uniformly such that \(\beta_1(0; Z) + \beta_0(0; Z) = 2\beta^{(0)}\). Quantities
such as that in Eq. (3) are crucial in causal inference because they can be used as a pivot
for extrapolation across assignments.

Assumption 3 states that the treatment assignment does not add information about the
population behavior at time \(t\), if we already know the full behavioral history up to \(t\), and
the policy which agents are assigned to; hence, the treatment assignment is conditionally
ignorable. This ignorability assumption precludes, for instance, an agent adopting a different
behavior depending on whether it was assigned with friends or foes in the experiment.

Algorithm 1 is the main methodological contribution of this paper. It is a Bayesian
procedure as it puts priors on parameters of the temporal model, and then marginalizes 
these parameters out. 


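Algorithm 1 Estimation of long-term causal effects.

Input: \(Z, T, \mathcal{A}, \mathcal{B}, G_1, G_0\), \(\mathcal{D}_1 = \{\alpha_1(t; Z) : t = 0, \ldots, t_0\}\), \(\mathcal{D}_0 = \{\alpha_0(t; Z) : t = 0, \ldots, t_0\}\).

Output: Estimate of long-term causal effect \(CE(T)\) in Eq. (1).

1: By Assumption 3, define \(\phi_j \equiv \phi_j(Z)\), \(\psi_j \equiv \psi_j(Z)\).

2: Set \(\mu_1 \leftarrow 0\) and \(\mu_0 \leftarrow 0\), both of size \(|\mathcal{A}|\); set \(\nu_0 = \nu_1 = 0\).

3: for iter = 1, 2, ... do

4:   For \(j = 0, 1\), sample \(\phi_j, \psi_j\) from prior, and sample \(\beta_j(0; Z)\) conditional on \(\phi_j\).

5:   Calculate \(\beta^{(0)} = \rho_Z \beta_1(0; Z) + (1 - \rho_Z) \beta_0(0; Z)\).

6:   for \(j = 0, 1\) do

7:     Set \(\beta_j(0; j\mathbf{1}) = \beta^{(0)}\).

8:     Sample \(B_j = \{\beta_j(t; j\mathbf{1}) : t = 0, \ldots, T\}\) given \(\psi_j\) and \(\beta_j(0; j\mathbf{1})\). # temporal model

9:     Sample \(\alpha_j(T; j\mathbf{1})\) conditional on \(\beta_j(T; j\mathbf{1})\). # behavioral model

10:     Set \(\mu_j \leftarrow \mu_j + P(\mathcal{D}_j \mid B_j, G_j) \cdot R(\alpha_j(T; j\mathbf{1}))\).

11:     Set \(\nu_j \leftarrow \nu_j + P(\mathcal{D}_j \mid B_j, G_j)\).

12:   end for

13: end for

14: Return estimate \(\widehat{CE}(T) = \mu_1/\nu_1 - \mu_0/\nu_0\).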
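
The control flow of Algorithm 1 can be sketched in Python as a likelihood-weighted average. This is only a sketch under stated assumptions: every callable below (`sample_params`, `sample_initial`, `evolve`, `sample_action`, `likelihood`, `objective`) is a hypothetical interface that the user would have to supply, the accumulators are kept as scalars for simplicity, and the toy models at the bottom are made up to check that the loop runs.

```python
import random

def estimate_long_term_effect(sample_params, sample_initial, evolve, sample_action,
                              likelihood, objective, rho, horizon, n_iter, rng):
    """Likelihood-weighted sketch of Algorithm 1 (all callables are assumed interfaces).

    sample_params(j, rng)                 -> (phi_j, psi_j) drawn from the prior
    sample_initial(phi_j, rng)            -> beta_j(0; Z) as a list of frequencies
    evolve(j, psi_j, beta0, horizon, rng) -> [beta_j(t; j1) for t = 0..horizon]
    sample_action(j, beta_T, rng)         -> alpha_j(T; j1)
    likelihood(j, history)                -> P(D_j | B_j, G_j), a nonnegative weight
    objective(alpha)                      -> R(alpha), a number
    """
    mu = [0.0, 0.0]   # weighted sums of objective values, one per policy
    nu = [0.0, 0.0]   # sums of weights, one per policy
    for _ in range(n_iter):
        params = [sample_params(j, rng) for j in (0, 1)]               # step 4
        init = [sample_initial(params[j][0], rng) for j in (0, 1)]
        beta0 = [rho * b1 + (1.0 - rho) * b0                           # step 5
                 for b0, b1 in zip(init[0], init[1])]
        for j in (0, 1):
            history = evolve(j, params[j][1], beta0, horizon, rng)     # steps 7-8
            alpha_T = sample_action(j, history[-1], rng)               # step 9
            weight = likelihood(j, history)
            mu[j] += weight * objective(alpha_T)                       # step 10
            nu[j] += weight                                            # step 11
    return mu[1] / nu[1] - mu[0] / nu[0]                               # step 14

# Toy check with degenerate, made-up models (not the models used in the paper).
estimate = estimate_long_term_effect(
    sample_params=lambda j, rng: (None, None),
    sample_initial=lambda phi, rng: [0.3, 0.7],
    evolve=lambda j, psi, beta0, horizon, rng: [beta0] * (horizon + 1),
    sample_action=lambda j, beta_T, rng: beta_T if j == 0 else beta_T[::-1],
    likelihood=lambda j, history: 1.0,
    objective=lambda alpha: alpha[0],
    rho=0.5, horizon=3, n_iter=10, rng=random.Random(0),
)
```

In a real application the likelihood weights can be extremely small, so an implementation would likely accumulate them in log space; that refinement is omitted here.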


Theorem 1 (Estimation of long-term causal effects). Suppose that behaviors evolve according
to a known temporal model, and actions are distributed conditionally on behaviors
according to a known behavioral model. Suppose that Assumptions 1, 2 and 3 hold for
such models. Then, for every policy \(j \in \{0, 1\}\), as the iterations of Algorithm 1 increase,
\(\mu_j/\nu_j \to E(R(\alpha_j(T; j\mathbf{1})) \mid \mathcal{D}_j)\). The output \(\widehat{CE}(T)\) of Algorithm 1 asymptotically estimates
the long-term causal effect, i.e.,

\[ E(\widehat{CE}(T)) = E\big(R(\alpha_1(T; \mathbf{1})) - R(\alpha_0(T; \mathbf{0}))\big) = CE(T). \]
Remarks. Theorem 1 shows that \(\widehat{CE}(T)\) consistently estimates the long-term causal
effect in Eq. (1). We note that it is also possible to derive the variance of this estimator
with respect to the randomization distribution of assignment \(Z\). To do so we first create a
set of assignments \(\mathcal{Z}\) by repeatedly sampling \(Z\) according to the experimental design. Then
we adapt Algorithm 1 so that (i) Step 4 is removed; (ii) in Step 5, \(\beta^{(0)}\) is sampled from its
posterior distribution conditional on observed data, which can be obtained from the original
Algorithm 1. The empirical variance of the outputs over \(\mathcal{Z}\) from the adapted algorithm
estimates the variance of the output \(\widehat{CE}(T)\) of the original algorithm. We leave the full
characterization of this variance estimation procedure for future work.

As Theorem 1 relies on Assumptions 2 and 3, it is worth noting that the assumptions may
be hard but not impossible to test in practice. For example, one idea to test Assumption 3 is
to use data from multiple experiments on a single game \(j\). If fitting the temporal model (2) on
such data yields parameter estimates \((\hat\phi_j(Z), \hat\psi_j(Z))\) that depend on experimental assignment
\(Z\), then Assumption 3 would be unjustified. A similar test could be used for Assumption 2
as well.


3.1 Discussion 

Methodologically, our approach is aligned with the idea that for long-term causal effects we 
need a model for outcomes that leverages structural information pertaining to how outcomes 
are generated and how they evolve. In our application such structural information is the 
microeconomic information that dictates what agent behaviors are successful in a given 
policy and how these behaviors evolve over time. 

In particular, Step 1 in the algorithm relies on Assumptions 2 and 3 to infer that model
parameters, \(\phi_j, \psi_j\), are stable with respect to treatment assignment. Step 5 of the algorithm
is the key estimation pivot, which uses Assumption 2 to extrapolate from the experimental
assignment \(Z\) to the counterfactual assignments \(Z = \mathbf{1}\) and \(Z = \mathbf{0}\), as required in our
problem. Having pivoted to such counterfactual assignment, it is then possible to use the
temporal model parameters \(\psi_j\), which are unaffected by the pivot under Assumption 3, to
sample population behaviors up to long-term \(T\), and subsequently sample agent actions at
\(T\) (Steps 8 and 9).

Thus, a lot of burden is placed on the behavioral game-theoretic model to predict agent
actions, and the accuracy of such models is still not settled (Hahn et al., 2015). However,
it does not seem necessary that such prediction is completely accurate, but rather that the
behavioral models can pull relevant information from data that would otherwise be inaccessible
without game theory, thereby improving over classical methods. A formal assessment
of such improvement, e.g., using information theory, is open for future work. An empirical
assessment can be supported by the extensive literature in behavioral game theory (Stahl
and Wilson, 1994; McKelvey and Palfrey, 1995), which has been successful in predicting
human actions in real-world experiments (Wright and Leyton-Brown, 2010).


Another limitation of our approach is Assumption 1, which posits that there is a finite set
of predefined behaviors. A nonparametric approach where behaviors are estimated on-the-fly 
might do better. In addition, the long-term horizon, T, also needs to be defined a priori. 
We should be careful how T interferes with the temporal model since such a model implies 
a time T' at which population behavior reaches stationarity. Thus if T' < T we implicitly 
assume that the long-term causal effect of interest pertains to a stationary regime (e.g., Nash 
equilibrium), but if T' > T we assume that the effect pertains to a transient regime, and 
therefore the policy evaluation might be misguided. 


4 Application on data from a behavioral experiment 


In this section, we apply our methodology to experimental data from Rapoport and Boebel
(1992), as reported by McKelvey and Palfrey (1995). The experiment consisted of a series
of zero-sum two-agent games, and aimed at examining the hypothesis that human players
play according to minimax solutions of the game, the so-called minimax hypothesis initially
suggested by Von Neumann and Morgenstern (1944). Here we repurpose the data in a
slightly artificial way, including how we construct the designer’s objective. This enables a
suitable demonstration of our approach.

Each game in the experiment was a simultaneous-move game with five discrete actions
for the row player and five actions for the column player. The structure of the payoff matrix,
given in the supplement in Table 1, is parametrized by two values, namely \(W\) and \(L\); the
experiment used two different versions of payoff matrices, corresponding to payments by
the row agent to the column agent when the row agent won (\(W\)), or lost (\(L\)): modulo
a scaling factor, Rapoport and Boebel (1992) used \((W, L) = (\$10, -\$6)\) for game 0 and
\((W, L) = (\$15, -\$1)\) for game 1.
Forty agents, \(\mathcal{I} = \{1, 2, \ldots, 40\}\), were randomized to one game design (20 agents per
game), and each agent played once as row and once as column, matched against two different
agents. Every match-up between a pair of agents lasted for two periods of 60 rounds, with
each round consisting of a selection of an action from each agent and a payment. Thus,
each agent played for four periods and 240 rounds in total. If \(Z\) is the entire assignment
vector of length 40, \(Z_i = 1\) means that agent \(i\) was assigned to game 1 with payoff matrix
\((W, L) = (\$15, -\$1)\) and \(Z_i = 0\) means that \(i\) was assigned to game 0 with payoff matrix
\((W, L) = (\$10, -\$6)\).

In adapting the data, we take advantage of the randomization in the experiment, and 
ask a question in regard to long-term causal effects. In particular, assuming that agents pay 
a fee for each action taken, which accounts for the revenue of the game, we ask the following 
question: 

“What is the long-term causal effect on revenue if we switch from payoffs \((W, L) = (\$10, -\$6)\)
of game 0 to payoffs \((W, L) = (\$15, -\$1)\) of game 1?”

The games induced by the two aforementioned payoff matrices represent the two different
policies we wish to compare. To evaluate our method, we consider the last period as long-term,
and hold out data from this period. We define the causal estimand in Eq. (1) as

\[ CE = c^\top (\alpha_1(T; \mathbf{1}) - \alpha_0(T; \mathbf{0})), \tag{4} \]

where \(T = 3\) and \(c\) is a vector of coefficients. The interpretation is that, given an element \(c_a\)
of \(c\), the agent playing action \(a\) is assumed to pay a constant fee \(c_a\). To check the robustness
of our method we test Algorithm 1 over multiple values of \(c\).

4.1 Implementation of Algorithm 1 and results

Here we demonstrate how Algorithm 1 can be applied to estimate the long-term causal effect
in Eq. (4) on the Rapoport & Boebel dataset. To this end we clarify Algorithm 1 step by
step, and give more details in the supplement.

Step 1: Model parameters. For simplicity we assume that the models in the two
games share common parameters, and thus \((\phi_1, \psi_1, \lambda_1) = (\phi_0, \psi_0, \lambda_0) \equiv (\phi, \psi, \lambda)\), where \(\lambda\)
are the parameters of the behavioral model to be described in Step 9. Having common
parameters also acts as regularization and thus helps estimation.

Step 4: Sampling parameters and initial behaviors. As explained later we assume
that there are 3 different behaviors and thus \(\phi, \psi, \lambda\) are vectors with 3 components. Let
\(x \sim U(m, M)\) denote that every component of \(x\) is uniform on \((m, M)\), independently.
We choose diffuse priors for our parameters, specifically, \(\phi \sim U(0, 10)\), \(\psi \sim U(-5, 5)\), and
\(\lambda \sim U(-10, 10)\). Given \(\phi\) we sample the initial behaviors as Dirichlet, i.e., \(\beta_1(0; Z) \sim \mathrm{Dir}(\phi)\)
and \(\beta_0(0; Z) \sim \mathrm{Dir}(\phi)\), independently.
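
Step 4 can be sketched with the Python standard library alone. The Dirichlet draw below uses the standard construction of normalizing independent Gamma variates; the helper names and the seed are illustrative assumptions rather than anything specified in the paper.

```python
import random

def sample_uniform_vector(low, high, size, rng):
    """Each component independently uniform on (low, high)."""
    return [rng.uniform(low, high) for _ in range(size)]

def sample_dirichlet(concentration, rng):
    """Dirichlet draw obtained by normalizing independent Gamma(a, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(1)                            # arbitrary seed for reproducibility
phi = sample_uniform_vector(0.0, 10.0, 3, rng)    # initial-behavior parameters
psi = sample_uniform_vector(-5.0, 5.0, 3, rng)    # temporal-model parameters
lam = sample_uniform_vector(-10.0, 10.0, 3, rng)  # behavioral-model parameters
beta1_init = sample_dirichlet(phi, rng)           # beta_1(0; Z)
beta0_init = sample_dirichlet(phi, rng)           # beta_0(0; Z)
```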

Steps 5 & 7: Pivot to counterfactuals. Since we have a completely randomized
experiment (A/B test) it holds that \(\rho_Z = 0.5\) and therefore \(\beta^{(0)} = 0.5(\beta_1(0; Z) + \beta_0(0; Z))\).
Now we can pivot to the counterfactual population behaviors under \(Z = \mathbf{1}\) and \(Z = \mathbf{0}\) by
setting \(\beta_1(0; \mathbf{1}) = \beta_0(0; \mathbf{0}) = \beta^{(0)}\).
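
The pivot amounts to a convex combination of the two sampled initial behaviors. A minimal sketch, with made-up initial behaviors and a hypothetical helper name:

```python
def pivot_initial_behavior(beta1_init, beta0_init, rho):
    """beta^(0) = rho * beta_1(0; Z) + (1 - rho) * beta_0(0; Z)."""
    return [rho * b1 + (1.0 - rho) * b0 for b1, b0 in zip(beta1_init, beta0_init)]

beta_treated = [0.5, 0.3, 0.2]   # hypothetical beta_1(0; Z)
beta_control = [0.1, 0.5, 0.4]   # hypothetical beta_0(0; Z)
beta_pivot = pivot_initial_behavior(beta_treated, beta_control, rho=0.5)

# Under Assumption 2 both counterfactual economies start from the same point.
beta1_all_treated = list(beta_pivot)   # beta_1(0; 1)
beta0_all_control = list(beta_pivot)   # beta_0(0; 0)
```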

Step 8: Sample counterfactual behavioral history. As the temporal model, we
adopt the lag-one vector autoregressive model, also known as VAR(1). We transform\(^{1}\) the
population behavior into a new variable \(w_t = \mathrm{logit}(\beta_1(t; \mathbf{1})) \in \mathbb{R}^2\) (also do so for \(\beta_0(t; \mathbf{0})\)).
Such transformation with a unique inverse is necessary because population behaviors are
constrained on the simplex, and thus form so-called compositional data (Aitchison, 1986;
Grunwald et al., 1993). The VAR(1) model implies that

\[ w_t = \psi[1]\mathbf{1} + \psi[2] w_{t-1} + \psi[3] \epsilon_t, \]

where \(\psi[k]\) is the \(k\)th component of \(\psi\) and \(\epsilon_t \sim \mathcal{N}(0, I)\) is i.i.d. standard bivariate normal.
Eq. (6) is used to sample the behavioral history, \(B_j\), in Step 8 of Algorithm 1.
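
The transformation and the VAR(1) recursion can be sketched as follows. The code uses 0-based indexing, so `psi[0]`, `psi[1]`, `psi[2]` stand for the first, second and third components of the temporal parameter vector; the function names, and the numerically stabilized inverse, are assumptions made for this illustration.

```python
import math
import random

def logit(x):
    """Map a simplex point of length m to R^(m-1): y[i] = log(x[i+1] / x[0])."""
    return [math.log(xi / x[0]) for xi in x[1:]]

def inv_logit(y):
    """Inverse of logit; subtracting the largest exponent avoids overflow."""
    shift = max([0.0] + list(y))
    expd = [math.exp(-shift)] + [math.exp(v - shift) for v in y]
    total = sum(expd)
    return [e / total for e in expd]

def simulate_var1(beta_start, psi, horizon, rng):
    """Sample beta(0), ..., beta(horizon) with
    w_t = psi[0] + psi[1] * w_{t-1} + psi[2] * eps_t, eps_t standard normal."""
    w = logit(beta_start)
    path = [list(beta_start)]
    for _ in range(horizon):
        w = [psi[0] + psi[1] * wi + psi[2] * rng.gauss(0.0, 1.0) for wi in w]
        path.append(inv_logit(w))
    return path

# Hypothetical starting behavior and parameters, three steps ahead.
path = simulate_var1([0.2, 0.3, 0.5], [0.1, 0.5, 0.2], 3, random.Random(0))
```

Each element of `path` is again a point on the simplex, which is the reason for working on the transformed scale.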

Step 9: Behavioral model. For the behavioral model, we adopt the quantal p-response
(\(QL_p\)) model (Stahl and Wilson, 1994), which has been successful in predicting human actions
in real-world experiments (Wright and Leyton-Brown, 2010). We choose \(p = 3\) behaviors,
namely \(\mathcal{B} = \{b_0, b_1, b_2\}\) of increased sophistication parametrized by \(\lambda = (\lambda[1], \lambda[2], \lambda[3]) \in \mathbb{R}^3\).
Let \(G_j\) denote the \(5 \times 5\) payoff matrix of game \(j\) and let the term strategy denote a distribution
over all actions. An agent with behavior \(b_0\) plays the uniform strategy,

\[ P(A_i(t; Z) = a \mid B_i(t; Z) = b_0, G_j) = 1/5. \]

1 \(y = \mathrm{logit}(x)\) is defined as the function \(\Delta^m \to \mathbb{R}^{m-1}\), \(y[i] = \log(x[i+1]/x[1])\), where \(x[1] \neq 0\) wlog.
An agent of level-1 (row player) assumes it is playing only against level-0 agents and thus
expects per-action profit \(u_1 = (1/5) G_j \mathbf{1}\) (for column player we use the transpose of \(G_j\)). The
level-1 agent will then play a strategy proportional to \(e^{\lambda[1] u_1}\), where \(e^x\) for vector \(x\) denotes
the element-wise exponentiation, \(e^x = (e^{x[k]})\). The precision parameter \(\lambda[1]\) determines how
much an agent insists on maximizing expected utility; for example, if \(\lambda[1] = \infty\), the agent
plays the action with maximum expected payoff (best response); if \(\lambda[1] = 0\), the agent acts
as a level-0 agent. An agent of level-2 (row player) assumes it is playing only against level-1
agents with precision \(\lambda[2]\) and therefore expects to face strategy proportional to \(e^{\lambda[2] u_1}\). Thus
its expected per-action profit is \(u_2 \propto G_j e^{\lambda[2] u_1}\), and it plays strategy \(\propto e^{\lambda[3] u_2}\).
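
The three strategies can be computed with a few lines of Python. The sketch follows the text literally, including its reuse of \(u_1\) for the strategy that a level-2 agent expects to face; the \(2 \times 2\) payoff matrix is made up for illustration and is not one of the matrices from the experiment, and `lam` is 0-indexed.

```python
import math

def softmax(utilities, precision):
    """Strategy proportional to exp(precision * utilities), computed stably."""
    scaled = [precision * u for u in utilities]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def expected_payoffs(payoff, opponent_strategy):
    """Row player's expected per-action payoff against a mixed opponent strategy."""
    return [sum(g * q for g, q in zip(row, opponent_strategy)) for row in payoff]

def level_strategies(payoff, lam):
    """Level-0, level-1 and level-2 strategies of the row player."""
    n = len(payoff)
    level0 = [1.0 / n] * n                    # uniform play
    u1 = expected_payoffs(payoff, level0)     # u_1 = (1/n) G 1
    level1 = softmax(u1, lam[0])
    believed_opponent = softmax(u1, lam[1])   # level-2 belief about level-1 play
    u2 = expected_payoffs(payoff, believed_opponent)
    level2 = softmax(u2, lam[2])
    return level0, level1, level2

toy_payoff = [[1.0, 0.0], [0.0, 2.0]]         # hypothetical 2x2 game
s0, s1, s2 = level_strategies(toy_payoff, [0.0, 1.0, 1.0])
```

With the first precision set to zero, the level-1 strategy collapses to uniform play, matching the remark in the text.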

Given \(G_j\) and \(\lambda\) we calculate a \(5 \times 3\) matrix \(Q_j\) where the \(k\)th column is the strategy
played by an agent with behavior \(b_{k-1}\). The expected population action is therefore
\(\bar\alpha_j(t; Z) = Q_j \beta_j(t; Z)\). The population action \(\alpha_j(t; Z)\) is distributed as a normalized
multinomial random variable with expectation \(\bar\alpha_j(t; Z)\), and so
\(P(\alpha_j(t; \mathbf{1}) \mid \beta_j(t; \mathbf{1}), G_j) = \mathrm{Multi}(|\mathcal{I}| \cdot \alpha_j(t; \mathbf{1}); \bar\alpha_j(t; \mathbf{1}))\), where \(\mathrm{Multi}(n; p)\) is the multinomial density of observations
\(n = (n_1, \ldots, n_K)\) with probabilities \(p = (p_1, \ldots, p_K)\). Hence, the full likelihood for observed
actions in game \(j\) in Steps 10 and 11 of Algorithm 1 is given by the product

\[ P(\mathcal{D}_j \mid B_j, G_j) = \prod_{t=0}^{T-1} \mathrm{Multi}(|\mathcal{I}| \cdot \alpha_j(t; j\mathbf{1}); \bar\alpha_j(t; j\mathbf{1})). \]
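
A sketch of this likelihood, evaluated on the log scale to avoid underflow when many time points are multiplied together. Working in logs is a choice made for this example, and the function names are hypothetical; `observed_counts[t]` stands for the action counts at time \(t\) (the population action scaled by the number of agents).

```python
import math

def multinomial_log_pmf(counts, probs):
    """log Multi(counts; probs), using log-gamma for the multinomial coefficient."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for k, p in zip(counts, probs):
        log_p -= math.lgamma(k + 1)
        if k > 0:
            if p <= 0.0:
                return float("-inf")   # an observed action with zero probability
            log_p += k * math.log(p)
    return log_p

def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def log_likelihood(observed_counts, behavior_path, Q):
    """Sum over time of log Multi(counts_t; Q beta_t), i.e. the log of the product."""
    return sum(multinomial_log_pmf(c, mat_vec(Q, b))
               for c, b in zip(observed_counts, behavior_path))
```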

Running Algorithm 1 on the Rapoport and Boebel dataset yields the estimates shown
in Figure 2, for 25 different fee vectors \(c\), where each component \(c_a\) is sampled uniformly at
random from (0,1). We also test difference-in-differences (DID), which estimates the causal
effect through

\[ \hat\tau^{did} = [R(\alpha_1(2; Z)) - R(\alpha_1(0; Z))] - [R(\alpha_0(2; Z)) - R(\alpha_0(0; Z))], \]

and a naive method (“naive” in the plot), which ignores the dynamical aspect and estimates
the long-term causal effect as \(\hat\tau^{nai} = R(\alpha_1(2; Z)) - R(\alpha_0(2; Z))\).
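
The two baseline estimators are simple enough to state in code. The objective values below are invented numbers used only to exercise the functions; they are not results from the dataset.

```python
def did_estimate(r1_start, r1_end, r0_start, r0_end):
    """Difference-in-differences: treated change minus control change."""
    return (r1_end - r1_start) - (r0_end - r0_start)

def naive_estimate(r1_end, r0_end):
    """Naive estimate: last observed treated objective minus control objective."""
    return r1_end - r0_end

# Hypothetical objective values at t = 0 and t = 2 for each policy.
did = did_estimate(r1_start=0.50, r1_end=0.62, r0_start=0.48, r0_end=0.51)
naive = naive_estimate(r1_end=0.62, r0_end=0.51)
```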

Our estimates (“LACE” in the plot) are closer to the truth (mse = 0.045) than the 
estimates from the naive method (mse = 0.185) and from DID (mse = 0.361). This illustrates 
that our method can pull game-theoretic information from the data for long-term causal 
inference, whereas the other methods cannot. 
Figure 2: Estimates of long-term effects from different methods corresponding to 25 random
objective coefficients \(c\) in Eq. (4). For estimates of our method we ran Algorithm 1 for 100
iterations.

5 Conclusion 

One critical shortcoming of statistical methods of causal inference is that they typically do 
not assess the long-term effect of policy changes. Here we combined causal inference and 
game theory to build a framework for estimation of such long-term effects in multiagent 
economies. Central to our approach is behavioral game theory, which provides a natural 
latent space model of how agents act and how their actions evolve over time. Such models
make it possible to predict how agents would act under various policy assignments and at
various time points, which is key for valid causal inference. Working with data from an actual
behavioral experiment, we showed how our framework can be applied to estimate the long-term
effect of changing the payoff structure of a normal-form game.

Our framework could be extended in future work by incorporating learning (e.g., fictitious
play, bandits, no-regret learning) to better model the dynamic response of multiagent
systems to policy changes. Another interesting extension would be to use our framework for
optimal design of experiments in such systems, which needs to account for heterogeneity in
agent learning capabilities and for intrinsic dynamical properties of the systems’ responses
to experimental treatments.

A Proof of Theorem 1


Theorem 1 (Estimation of long-term causal effects). Suppose that behaviors evolve according
to a known temporal model, and actions are distributed conditionally on behaviors
according to a known behavioral model. Suppose that Assumptions 1, 2, and 3 hold for
such models. Then, for every policy $j \in \{0, 1\}$, as the iterations of Algorithm 1 increase,
$\mu_j/\nu_j \to E(R(\alpha_j(T; j\mathbf{1})) \mid \mathcal{D}_j)$. The output $\widehat{CE}(T)$ of Algorithm 1 asymptotically estimates
the long-term causal effect, i.e.,

$$E(\widehat{CE}(T)) = E\big(R(\alpha_1(T; \mathbf{1})) - R(\alpha_0(T; \mathbf{0}))\big) = CE(T).$$


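Proof. Fix a policy $j$ in Algorithm 1 and drop the subscript $j$ in the notation of the algorithm.
Therefore we can write:

$$\omega \equiv (\phi_j, \psi_j, B_j), \qquad \alpha \equiv \alpha_j(T; j\mathbf{1}), \qquad P(\mathcal{D} \mid \omega) \equiv P(\mathcal{D}_j \mid B_j, G_j).$$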


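The way Algorithm 1 is defined, as the iterations increase the variable $\mu$ is estimating

$$\lim \mu = \int R(\alpha) P(\mathcal{D} \mid \omega) p(\alpha, \omega)\, d\omega\, d\alpha.$$

We now rewrite this integral as follows.

$$\begin{aligned}
\lim \mu &= \int R(\alpha) P(\mathcal{D} \mid \omega) p(\alpha, \omega)\, d\omega\, d\alpha \\
&= \int R(\alpha) P(\mathcal{D} \mid \alpha, \omega) p(\alpha, \omega)\, d\omega\, d\alpha && [\text{because } P(\mathcal{D} \mid \alpha, \omega) = P(\mathcal{D} \mid \omega)] \\
&= \int R(\alpha) P(\alpha, \omega \mid \mathcal{D}) P(\mathcal{D})\, d\omega\, d\alpha && [\text{by Bayes theorem}] \\
&= P(\mathcal{D}) \int R(\alpha) P(\alpha \mid \mathcal{D})\, d\alpha && [\omega \text{ is marginalized out}] \\
&= P(\mathcal{D})\, E(R(\alpha) \mid \mathcal{D}).
\end{aligned}$$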


The first equation, $P(\mathcal{D} \mid \alpha, \omega) = P(\mathcal{D} \mid \omega)$, holds by definition of the behavioral model: the
history of latent behaviors is sufficient for the likelihood of observed actions. Another way
to phrase this is that, conditional on latent behavior, the observed action is independent of
any other variable.

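Similarly, as the iterations increase the variable $\nu$ is estimating

$$\lim \nu = \int P(\mathcal{D} \mid \omega) p(\alpha, \omega)\, d\omega\, d\alpha.$$

We now rewrite this integral as follows.

$$\begin{aligned}
\lim \nu &= \int P(\mathcal{D} \mid \omega) p(\alpha, \omega)\, d\omega\, d\alpha \\
&= \int P(\mathcal{D} \mid \alpha, \omega) p(\alpha, \omega)\, d\omega\, d\alpha && [\text{because } P(\mathcal{D} \mid \alpha, \omega) = P(\mathcal{D} \mid \omega)] \\
&= \int P(\alpha, \omega \mid \mathcal{D}) P(\mathcal{D})\, d\omega\, d\alpha && [\text{by Bayes theorem}] \\
&= P(\mathcal{D}) \int P(\alpha \mid \mathcal{D})\, d\alpha \\
&= P(\mathcal{D}).
\end{aligned}$$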

By the continuous mapping theorem we conclude that

$$\lim \mu/\nu \to E(R(\alpha) \mid \mathcal{D}).$$

Thus $E(\lim \mu_1/\nu_1) = E(R(\alpha_1(T; \mathbf{1})))$ and $E(\lim \mu_0/\nu_0) = E(R(\alpha_0(T; \mathbf{0})))$, and so

$$E(\lim \mu_1/\nu_1) - E(\lim \mu_0/\nu_0) \to E(R(\alpha_1(T; \mathbf{1}))) - E(R(\alpha_0(T; \mathbf{0}))),$$

i.e., Algorithm 1 consistently estimates the long-term causal effect.


□ 
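
The ratio argument in the proof can be illustrated with a toy example. The sketch below is not Algorithm 1 itself: it replaces the temporal and behavioral models with a hypothetical one-parameter setup (a uniform prior on $\omega$, a binomial likelihood with $k$ successes in $n$ trials, and $\alpha = \omega$), chosen only because the posterior mean $(k+1)/(n+2)$ is known in closed form. All function and argument names are assumptions made for illustration.

```python
import random


def weighted_ratio_estimate(sample_latent, sample_action, likelihood,
                            objective, iterations, seed=0):
    """Accumulate mu = sum R(alpha) * P(D | omega) and nu = sum P(D | omega)."""
    rng = random.Random(seed)
    mu = 0.0
    nu = 0.0
    for _ in range(iterations):
        omega = sample_latent(rng)          # draw latent quantities from the prior
        alpha = sample_action(rng, omega)   # draw the action given the latents
        weight = likelihood(omega)          # P(D | omega)
        mu += objective(alpha) * weight
        nu += weight
    return mu / nu                          # approximates E(R(alpha) | D)


def toy_example(iterations=50000):
    """Toy check: uniform prior, k = 7 successes in n = 10 trials, alpha = omega."""
    k, n = 7, 10
    return weighted_ratio_estimate(
        sample_latent=lambda rng: rng.random(),
        sample_action=lambda rng, omega: omega,
        likelihood=lambda omega: omega ** k * (1.0 - omega) ** (n - k),
        objective=lambda alpha: alpha,
        iterations=iterations,
    )
```

In this toy setup the ratio should approach $8/12$ as the number of iterations grows, mirroring the limit $\mu/\nu \to E(R(\alpha) \mid \mathcal{D})$.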


B Connection of assumptions to policy invariance 

Assumption 3 in our framework is related to policy invariance assumptions in econometrics
of policy effects (Heckman and Vytlacil, 2005; Heckman et al., 1998). Intuitively, policy
invariance posits that given the choice of policy by an agent, the initial process that resulted
in this choice does not affect the outcome. For example, given that an individual chooses
to participate in a tax benefit program, the way the individual was assigned to the program
(e.g., lottery, recommendation, or point of a gun) does not alter the outcome that will
be observed for that individual. Our assumption is different because we have a temporal
evolution of population behavior and there is no free choice of an agent about the assignment,
since we assume a randomized experiment. But our assumption shares the essential aspect
of conditional ignorability of assignment that is crucial in causal inference.


C Discussion of related methods 


Consider the estimand for the Rapoport-Boebel experiment (Rapoport and Boebel, 1992):

$$\tau = c^\top(\alpha_1(T; \mathbf{1}) - \alpha_0(T; \mathbf{0})).$$

Here we discuss how standard methods would estimate this estimand. Our goal is to illustrate
the fundamental assumptions underpinning each method, and compare with our
Assumptions 2 and 3. To illustrate, we will assume a specific value $c = (0, 1, 0, 2, 0, 0, 0, 0, 1, 1)^\top$.
In discussing these methods, we will mostly be concerned with how point estimates compare
to the true value of the estimand, which here is $\tau = \$0.054$ using the experimental data in
Table 2.

The naive approach would be to consider only the latest observed time point ($t_0 = 2$)
under the experiment assignment $Z$, and use the observed population actions under $Z$ as an
estimate for $\tau$; i.e.,

$$\hat{\tau}^{naive} = c^\top(\alpha_1(t_0; Z) - \alpha_0(t_0; Z)) = -\$0.051.$$

But for this estimate to be unbiased for $\tau$, we generally require that

$$\alpha_1(t_0; Z) - \alpha_0(t_0; Z) = \alpha_1(T; \mathbf{1}) - \alpha_0(T; \mathbf{0}).$$

The naive estimate therefore makes a direct extrapolation from $t = t_0$ to $t = T$ and from the
observed assignment $Z$ to the counterfactual assignments $Z = \mathbf{1}$ and $Z = \mathbf{0}$. This ignores,
among other things, the dynamic nature of agent actions.

A more sophisticated approach is to analyze the agent actions as a time series. For example,
Brodersen et al. (2014) developed a method to estimate the effects of ad campaigns on
website visits. Their method was based on the idea of “synthetic controls”, i.e., they created
a time series using different sources of information that would act as the counterfactual to the
observed time series after the intervention. However, their problem is macroeconometric and
they work with observational data. Thus, there is neither experimental randomized assignment
to games, nor strategic interference between agents, nor dynamic agent actions. More
crucially, they do not study long-term equilibrium effects. By construction, in our problem we
can leverage behavioral game theory to make more informed predictions of counterfactuals
for time points after the intervention at which the distribution of outcomes has stabilized.

Another approach, common in econometrics, is the difference-in-differences (DID) estimator
(Card and Krueger, 1994; Donald and Lang, 2007; Ostrovsky and Schwarz, 2011). In
our case, this method is not perfectly applicable because there are no observations before
the intervention, but we can still entertain the idea by considering period $t = 1$ as the
pre-intervention period. The DID estimator compares the difference in outcomes before and after
the intervention for both the treated and control groups. In our application, this estimator
takes the value

$$\hat{\tau}^{did} = \underbrace{c^\top(\alpha_1(t_0; Z) - \alpha_1(1; Z))}_{\text{change in revenue for game 2}} - \underbrace{c^\top(\alpha_0(t_0; Z) - \alpha_0(1; Z))}_{\text{change in revenue for game 1}} = -\$0.164. \tag{5}$$

Like the naive estimate, this estimate is far from the true value. The DID estimator
is unbiased for $\tau$ only if there is an additive structure in the actions (Abadie, 2005; Angrist
and Pischke, 2008, Section 5.2), e.g., $\alpha_j(t; Z) = \mu_j + \lambda_t + \epsilon_{jt}$, where $\mu_j$ is a policy-specific
parameter, $\lambda_t$ is a temporal parameter, and $\epsilon_{jt}$ is noise. The DID estimator thus captures a
linear trend in the data by assuming a common parameter for both treatment arms ($\lambda_t$) that
is canceled out in the subtraction in Eq. (5). The extent to which an additivity assumption is
reasonable depends on the application; however, by definition, it implies ignorability of the
assignment (i.e., $Z$ does not appear in the model of $\alpha_j(t; Z)$), and thus it relies on assumptions
that are stronger than our assumptions (Abadie, 2005; Angrist and Pischke, 2008).


In a structural approach, Athey et al. (2011) studied the effects of timber auction format
(ascending versus sealed bid) on competition for timber tracts. They estimated bidder
valuations from observed data in one auction and imputed counterfactual bid distributions in
the other auction, under the assumption of equilibrium play in both auctions. This approach
makes two critical implicit assumptions that together are stronger than Assumption 3. First,
the bidder valuation distribution is assumed to be a primitive that can be used to impute
counterfactuals in other treatment assignments. In other words, the assignment is independent
of bidder values, and thus it is strongly ignorable. Second, although imputation
is performed for potential outcomes in equilibrium, which captures the notion of long-term
effects, inference is performed under the assumption of equilibrium play in the observed
outcomes, and thus temporal dynamic behavior is assumed away.

Finally, another popular approach to causality is through directed acyclic graphs (DAGs)
between the variables of interest (Pearl, 2000). For example, Bottou et al. (2013) studied
the causal effects of the machine learning algorithm that scores online ads in the Bing
search engine on the search engine revenue. Their approach was to create a full DAG
of the system including variables such as queries, bids, and prices, and to make a Causal
Markov assumption for the DAG. This makes it possible to predict counterfactuals for the revenue
under manipulations of the scoring algorithm, using only observed data generated from the
assumed DAG. However, a key assumption of the DAG approach is that the underlying
structural equation model is stable under the treatment assignment, and only edges coming
from parents of the manipulated variable need to be removed; as before, assignment is
considered strongly ignorable. As pointed out by Dash and Druzdzel (2001), this might be
implausible in equilibrium systems. Consider, for example, a system where $X \to Y \leftarrow Z$,
and a manipulation that sets the distribution of $Y$ independently of $X, Z$. Then after
manipulation the two edges will need to be removed. However, if in an equilibrium it is
required that $Y \propto XZ$, then the two arrows should be reversed after the manipulation.
Proper causal inference in equilibrium systems through causal graphs remains an open area
without a well-established methodology (Dash, 2005).


Finally, we note that there exists the concept of Granger causality (Granger, 1988), which
remains important in econometrics. The central idea in Granger causality is predictability,
in particular the ability of lagged iterates of a time series $x(t)$ to predict future values of
the outcome of interest, which in our case is the population action $\alpha_j(t; Z)$. This causality
concept does not take into account the randomization from the experimental design, which
is key in statistical causal inference.
D Application: Rapoport and Boebel (1992) data


The following tables report the payoff matrix structure (Table 1, used by Rapoport and
Boebel) and the observed data (Table 2), as reported by McKelvey and Palfrey (1995).


Table 1: Normal-form game in the experiment of Rapoport and Boebel (values L and W are
specified as described in the body of the paper) (Rapoport and Boebel, 1992).

         a'_1   a'_2   a'_3   a'_4   a'_5
  a_1     W      L      L      L      L
  a_2     L      L      W      W      W
  a_3     L      W      L      L      W
  a_4     L      W      L      W      L
  a_5     L      W      W      L      L


Table 2: Experimental data of Rapoport and Boebel (1992), as reported by McKelvey and
Palfrey (1995). The data include the frequency of actions for the row agent and the column
agent in the experiment, broken down by game and session. Gray color indicates that we
assume the data to be long-term and thus we hold them out of data analysis and only use
them to measure predictive performance. (Note: There are five total actions available to
every player according to the payoff structure in Table 1. The frequencies for actions
$a_5, a'_5$ can be inferred because $\sum_{i=1}^{5} a_i = 1$ and $\sum_{i=1}^{5} a'_i = 1$.)

                         row agent                       column agent
  Game  Period    a_1    a_2    a_3    a_4       a'_1   a'_2   a'_3   a'_4
   1      1      0.308  0.307  0.113  0.120      0.350  0.218  0.202  0.092
   1      2      0.293  0.272  0.162  0.100      0.333  0.177  0.190  0.140
   1      3      0.273  0.350  0.103  0.123      0.353  0.133  0.258  0.102
   1      4       --     --    0.113  0.135      0.372  0.192  0.222   --
   2      1      0.258  0.367  0.105  0.143      0.332  0.115  0.245  0.140
   2      2      0.290  0.347  0.118  0.110      0.355  0.198  0.208  0.108
   2      3      0.355  0.313  0.082  0.100      0.355  0.215  0.187  0.110
   2      4       --     --     --    0.105      0.343  0.243  0.168  0.107

  (--: value not legible in the source.)


E More details on Bayesian computation 


Here we offer more details about the choices in implementing Algorithm 1 in Section 4.1 of
the main paper. For convenience we repeat the content of Section 4.1 in the main paper and
then expand with our details.

Step 1: Model parameters. For simplicity we assume that the models in the two games
share common parameters, and thus $(\phi_1, \psi_1, \lambda_1) = (\phi_0, \psi_0, \lambda_0) \equiv (\phi, \psi, \lambda)$, where $\lambda$ are the
parameters of the behavioral model to be described in Step 9. Having common parameters
also acts as regularization and thus helps estimation. We emphasize that this simplification
is not necessary, as we could have two different sets of parameters, one for each game. It is crucial,
however, that the parameters are stable with respect to the treatment assignment because
we need to extrapolate from the observed assignment to the counterfactual ones.

Step 4: Sampling parameters and initial behaviors. As explained later we assume
that there are 3 different behaviors and thus $\phi, \psi, \lambda$ are vectors with 3 components. Let
$x \sim U(m, M)$ denote that every component of $x$ is uniform on $(m, M)$, independently. We
choose diffuse priors for our parameters, specifically, $\phi \sim U(0, 10)$, $\psi \sim U(-5, 5)$, and
$\lambda \sim U(-10, 10)$. Given $\phi$ we sample the initial behaviors in the two games as $\beta_1(0; Z) \sim \mathrm{Dir}(\phi)$
and $\beta_0(0; Z) \sim \mathrm{Dir}(\phi)$, independently.

Regarding the particular choices of these distributions, we first note that $\phi$ needs to have
positive components because it is used as an argument to the Dirichlet distribution. Larger
values than 10 could be used for the components of $\phi$ but the implied Dirichlet distributions
would not differ significantly from the ones we use in our experiments. Regarding $\lambda$ we note
that its components are used in quantities of the form $e^{\lambda[k] u}$ and so it is reasonable to bound
them, and the interval $[-5, 5]$ is diffuse enough given the values of $u$ implied by the payoff
matrix in Table 1. Finally, the prior for the temporal model parameters, $\psi$, is also diffuse
enough. An alternative would be to use a multivariate normal distribution as the prior for
$\psi$ but this would not alter the procedure significantly.

Steps 5 & 7: Pivot to counterfactuals. Since we have a completely randomized
experiment (A/B test) it holds that $p_Z = 0.5$ and therefore $\beta^{(0)} = 0.5(\beta_1(0; Z) + \beta_0(0; Z))$.
Now we can pivot to the counterfactual population behaviors under $Z = \mathbf{1}$ and $Z = \mathbf{0}$ by
setting $\beta_1(0; \mathbf{1}) = \beta_0(0; \mathbf{0}) = \beta^{(0)}$.
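
As a rough illustration of Step 4 and the pivot in Steps 5 & 7, the following sketch draws $\phi$ from its uniform prior, samples the two initial behaviors from a Dirichlet distribution, and averages them. The function names are hypothetical, and the Dirichlet draw is built from normalized gamma variates, which is a standard construction and an implementation choice made here rather than something taken from the paper.

```python
import random


def sample_dirichlet(rng, phi):
    """Draw from Dir(phi) by normalizing independent Gamma(phi_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in phi]
    total = sum(draws)
    return [d / total for d in draws]


def sample_initial_behaviors(rng, n_behaviors=3):
    """Step 4 sketch: phi ~ U(0, 10), then beta_1(0; Z), beta_0(0; Z) ~ Dir(phi)."""
    phi = [rng.uniform(0.0, 10.0) for _ in range(n_behaviors)]
    beta1 = sample_dirichlet(rng, phi)
    beta0 = sample_dirichlet(rng, phi)
    return phi, beta1, beta0


def pivot(beta1, beta0):
    """Steps 5 & 7 sketch for the balanced A/B test: beta^(0) = 0.5 (beta1 + beta0)."""
    return [0.5 * (b1 + b0) for b1, b0 in zip(beta1, beta0)]
```

The averaged vector stays on the simplex, so it can serve as the common starting behavior for both counterfactual histories.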

Step 8: Sample counterfactual behavioral history. As the temporal model, we
adopt the lag-one vector autoregressive model, also known as VAR(1). We transform$^2$ the
population behavior into a new variable $w_t = \mathrm{logit}(\beta_1(t; \mathbf{1})) \in \mathbb{R}^2$ (also do so for $\beta_0(t; \mathbf{0})$).
Such transformation with a unique inverse is necessary because population behaviors are
constrained on the simplex, and thus form so-called compositional data (Aitchison, 1986;
Grunwald et al., 1993). The VAR(1) model implies that

$$w_t = \psi[1]\mathbf{1} + \psi[2] w_{t-1} + \psi[3] \epsilon_t, \tag{6}$$

where $\psi[k]$ is the $k$th component of $\psi$ and $\epsilon_t \sim \mathcal{N}(0, I)$ is i.i.d. standard bivariate normal.

$^2$ The map $y = \mathrm{logit}(x)$ is defined as the function $\Delta^m \to \mathbb{R}^{m-1}$ such that, for vectors $y = (y_1, \ldots, y_{m-1})$
and $x = (x_1, \ldots, x_m)$ with $\sum_i x_i = 1$ and $x_1 \neq 0$ wlog, $y_i = \log(x_{i+1}/x_1)$ for $i = 1, \ldots, m-1$.

Eq. (6) is used to sample the behavioral history, $B_j$, from $t = 0$ to $t = T$, as described in
Step 8 of Algorithm 1.

Such sampling is straightforward to do. We simply need to sample the random noises
$\epsilon_t$ for every $t \in \{0, \ldots, T\}$, and then compute each $w_t$ successively. Given the sample
$\{w_t : t = 0, \ldots, T\}$ we can then transform back to calculate the population behaviors
$\{\beta_1(t; \mathbf{1}) = \mathrm{logit}^{-1}(w_t) : t = 0, \ldots, T\}$; for $B_0$ we repeat the same procedure with a new
sample of $\epsilon_t$ since the two games share the same temporal model parameters $\psi$.
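
The sampling procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the history is started from a given initial behavior (treated as $t = 0$) with $T$ further steps generated from the VAR(1) recursion, and the inverse transform is computed in a numerically stable way.

```python
import math
import random


def logit(x):
    """Map a point on the simplex to R^(m-1): y_i = log(x_{i+1} / x_1)."""
    return [math.log(xi / x[0]) for xi in x[1:]]


def inv_logit(y):
    """Inverse map back to the simplex, shifted by the max for stability."""
    shift = max([0.0] + list(y))
    exps = [math.exp(-shift)] + [math.exp(yi - shift) for yi in y]
    total = sum(exps)
    return [e / total for e in exps]


def sample_behavior_history(rng, beta_init, psi, horizon):
    """Simulate w_t = psi[0] * 1 + psi[1] * w_{t-1} + psi[2] * eps_t, then invert."""
    w = logit(beta_init)
    history = [list(beta_init)]
    for _ in range(horizon):
        w = [psi[0] + psi[1] * wi + psi[2] * rng.gauss(0.0, 1.0) for wi in w]
        history.append(inv_logit(w))
    return history
```

Because every step passes through the inverse transform, each sampled population behavior is guaranteed to lie on the simplex.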

Step 9: Behavioral model. Here we describe the behavioral model in more detail.
In QL$_p$ agents possess increasing levels of sophistication. Following earlier
work (Wright and Leyton-Brown, 2010), we adopt $p = 3$, and thus consider a behavioral
space with three different behaviors $\mathcal{B} = \{b_0, b_1, b_2\}$.

Recall that a behavior $b \in \mathcal{B}$ represents the distribution of actions that an agent will play
conditional on adopting that behavior. In QL$_p$ such distributions depend on an assumption
of quantal response, which is defined as follows. Let $u \in \mathbb{R}^{|\mathcal{A}|}$ denote a vector such that $u_a$
is the expected utility of an agent taking action $a \in \mathcal{A}$, and let $G_j$ denote the payoff matrix
in game $j$ as in Table 1. If an agent is facing another agent with strategy (distribution
over actions) $b$, then $u = G_j b$. The quantal best-response with parameter $x$ determines the
distribution of actions that the agent will take facing expected utilities $u$, and is defined as

$$\mathrm{QBR}(u; x) = \mathrm{expit}(x u),$$

where, for a vector $y$ with elements $y_i$, $\mathrm{expit}(y)$ is a vector with elements $\exp(y_i)/\sum_i \exp(y_i)$.
The parameter $x \geq 0$ is called the precision of the quantal best-response. If $x$ is very large
then the response is closer to the classical Nash best-response, whereas if $x = 0$ the agent
ignores the utilities and randomizes among actions.
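
The quantal best-response is a softmax over scaled utilities. A minimal sketch (the function name is hypothetical; the maximum is subtracted before exponentiating only for numerical stability and does not change the result):

```python
import math


def quantal_best_response(u, x):
    """QBR(u; x) = expit(x * u): softmax of the utilities scaled by precision x."""
    scaled = [x * ua for ua in u]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

With `x = 0` every action receives equal probability, and as `x` grows the distribution concentrates on the action with the highest expected utility.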

Let $\lambda = (\lambda[1], \lambda[2], \lambda[3])$ be the precision parameters. Let $\alpha(b)$ denote the distribution
over actions implied for an agent who adopts behavior $b$. Given $\lambda$ the model QL$_3$ calculates
$\alpha(b_k)$, for $k = 0, 1, 2$, as follows:

• Agents who adopt $b_0$, termed level-0 agents, have precision $\lambda_0 = 0$, and thus will
randomly pick one action from the action space $\mathcal{A}$. Thus,

$$\alpha(b_0) = \mathrm{QBR}(u; 0) = (1/|\mathcal{A}|)\mathbf{1},$$

regardless of the argument $u$.

• An agent who adopts $b_1$, termed level-1 agent, has precision $\lambda[1]$ and assumes that
it is playing against a level-0 agent. Thus, the agent is facing a vector of utilities
$u_1 = G_j b_0$, and so

$$\alpha(b_1) = \mathrm{QBR}(u_1; \lambda[1]).$$

• An agent who adopts $b_2$, termed level-2 agent, has precision $\lambda[3]$ and assumes it is playing
against a level-1 agent with precision $\lambda[2]$. Thus, it estimates that it is facing strategy
$\alpha(b_1)_2 = \mathrm{QBR}(u_1; \lambda[2])$, where $u_1 = G_j b_0$ as above. The expected utility vector of the
level-2 agent is $u_2 = G_j \alpha(b_1)_2$, and thus

$$\alpha(b_2) = \mathrm{QBR}(u_2; \lambda[3]).$$
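
The three levels above can be computed directly. The sketch below is illustrative only: the function names are hypothetical, the payoff pattern follows Table 1, and the numeric values passed for W and L in any call are placeholders chosen by the caller, not the values used in the experiment.

```python
import math


def qbr(u, x):
    """Quantal best-response: softmax of the utilities u scaled by precision x."""
    scaled = [x * ua for ua in u]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def payoff_matrix(W, L):
    """Win/loss pattern of Table 1 (rows a_1..a_5, columns a'_1..a'_5)."""
    return [
        [W, L, L, L, L],
        [L, L, W, W, W],
        [L, W, L, L, W],
        [L, W, L, W, L],
        [L, W, W, L, L],
    ]


def expected_utilities(G, b):
    """u = G b: expected utility of each action against opponent strategy b."""
    return [sum(g * bi for g, bi in zip(row, b)) for row in G]


def ql3_strategies(G, lam):
    """Return [alpha(b_0), alpha(b_1), alpha(b_2)], i.e. the columns of Q_j."""
    n = len(G)
    level0 = [1.0 / n] * n                  # level-0: uniform over actions
    u1 = expected_utilities(G, level0)
    level1 = qbr(u1, lam[0])                # level-1 responds with precision lam[1]
    believed_level1 = qbr(u1, lam[1])       # level-2's belief, precision lam[2]
    u2 = expected_utilities(G, believed_level1)
    level2 = qbr(u2, lam[2])                # level-2 responds with precision lam[3]
    return [level0, level1, level2]
```

For example, `ql3_strategies(payoff_matrix(1.0, -1.0), [1.0, 2.0, 3.0])` returns three probability vectors over the five actions; note that Python's zero-based `lam[0]`, `lam[1]`, `lam[2]` correspond to $\lambda[1]$, $\lambda[2]$, $\lambda[3]$ in the text.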

Given $G_j$ and $\lambda$ we can therefore write down a $5 \times 3$ matrix $Q_j = [\alpha(b_0), \alpha(b_1), \alpha(b_2)]$
where the $k$th column is the distribution over actions played by an agent conditional on
adopting behavior $b_{k-1}$. Conditional on population behavior $\beta_j(t; Z)$ the expected population
action is $\bar{\alpha}_j(t; Z) = Q_j \beta_j(t; Z)$. The population action $\alpha_j(t; Z)$ is distributed as a multinomial
with expectation $\bar{\alpha}_j(t; Z)$, and so
$P(\alpha_j(t; \mathbf{1}) \mid \beta_j(t; \mathbf{1}), G_j) = \mathrm{Multi}(|X| \cdot \alpha_j(t; \mathbf{1}); \bar{\alpha}_j(t; \mathbf{1}))$,
where $\mathrm{Multi}(n, p)$ is the multinomial density of observations $n = (n_1, \ldots, n_K)$ with expected
frequencies $p = (p_1, \ldots, p_K)$. Hence, the full likelihood for observed actions in game $j$
required in Steps 10 and 11 of Algorithm 1 is given by the product

$$P(\mathcal{D}_j \mid B_j, \lambda_j, G_j) = \prod_{t=0}^{T-1} \mathrm{Multi}\big(|X| \cdot \alpha_j(t; j\mathbf{1});\ \bar{\alpha}_j(t; j\mathbf{1})\big).$$
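
A sketch of this likelihood on the log scale is given below. Several details are assumptions made for illustration: the names are hypothetical, `n_agents` stands for the scaling factor written $|X|$ in the text, observed frequencies are converted to counts by rounding, and `Q` is stored as a list of columns so that `Q[k][a]` is the probability of action `a` under behavior `b_k`.

```python
import math


def multinomial_log_pmf(counts, probs):
    """Log of the multinomial density for integer counts and probabilities."""
    log_p = math.lgamma(sum(counts) + 1)
    for k, p in zip(counts, probs):
        log_p -= math.lgamma(k + 1)
        if k > 0:
            if p <= 0.0:
                return float("-inf")  # observed an action the model rules out
            log_p += k * math.log(p)
    return log_p


def log_likelihood(observed_freqs, behaviors, Q, n_agents):
    """Sum over periods of log Multi(n_agents * alpha_obs; Q beta)."""
    total = 0.0
    for alpha_obs, beta in zip(observed_freqs, behaviors):
        # Expected population action: mixture of behavior strategies.
        expected = [
            sum(Q[k][a] * beta[k] for k in range(len(beta)))
            for a in range(len(alpha_obs))
        ]
        counts = [round(n_agents * f) for f in alpha_obs]
        total += multinomial_log_pmf(counts, expected)
    return total
```

Working on the log scale avoids underflow when the product runs over many periods or many agents; exponentiating the result recovers the likelihood weight used in Steps 10 and 11.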